Simultaneous Factor Analysis in Several Populations



Abstract 

This paper is concerned with the study of similarities and differences
in factor structures between different groups. A common situation is when
a battery of tests has been administered to samples of examinees from several
populations.

A very general model is presented, in which any parameter in the factor
analysis models (factor loadings, factor variances, factor covariances, and
unique variances) for the different groups may be assigned an arbitrary value
or constrained to be equal to some other parameter. Given such a specification,
the model is estimated by the maximum likelihood method, yielding a
large sample $\chi^2$ test of goodness of fit. By computing several solutions
under different specifications one can test various hypotheses.

The method is capable of dealing with any degree of invariance, from
the one extreme, where nothing is invariant, to the other extreme, where
everything is invariant. Neither the number of tests nor the number of
common factors need to be the same for all groups, but to be at all
interesting, it is assumed that there is a common core of tests in each
battery that is the same or at least content-wise comparable.







Simultaneous Factor Analysis in Several Populations* 



1. Introduction and Summary

This paper is concerned with the study of similarities and differences
in factor structures between different groups. A common situation is when a
battery of tests has been administered to samples of examinees from several
populations. Traditionally this type of problem has been solved by obtaining
orthogonal unrotated solutions for each group separately, rotating these to
similarity and examining various similarity indices.

Perhaps the best approach to the problem is that of Meredith [1964a, b]
who has shown that, under certain conditions, when the various populations
are derivable as subpopulations from a parent population under selection
on some external variable, there is a factor pattern that is invariant over
populations. Meredith [1964b] gives two methods for estimating the common
factor pattern by least squares rotation of independent orthogonal solutions
for each group into a common factor pattern. If this can be achieved, the
common factor pattern may be rotated further, orthogonally or obliquely, to
a more readily interpretable solution.

The method to be presented is both more general and statistically more
optimal. It is more general in several respects. Firstly, the method may
be used regardless of whether the populations are derived by selection or
not. The only requirement is that the populations be clearly defined and
the samples independent. Secondly, the method is capable of dealing with
any degree of invariance, from the one extreme, where nothing is invariant,
to the other extreme, where everything is invariant. Thirdly, neither the
number of tests nor the number of common factors need to be the same for all
groups, but to be at all interesting it is assumed that there is a common core
of tests in each battery that is the same or at least content-wise comparable.

*This research was supported by grant NSF-GB-12959 from National
Science Foundation. Thanks are due to Marielle van Thillo who checked
the mathematical derivations and wrote and debugged the computer program
SIFASP.

A very general model is presented, in which any parameter in the factor
analysis models (factor loading, factor variance, factor covariance, and
unique variance) for the different populations may be assigned an arbitrary
value or constrained to be equal to some other parameter. Given such a
specification, the model is estimated by the maximum likelihood method assuming
the observed variables to have a multinormal distribution in each population.
This yields a large sample $\chi^2$ test of the goodness of fit of the overall
model. By computing several solutions under different specifications one
can test various hypotheses. For example, one can test the hypothesis of an
invariant factor pattern or the hypothesis of an invariant specified simple
structure factor pattern.

2. A General Model 

2.1 The Model 

Consider a set of $m$ populations, indexed $g = 1, 2, \ldots, m$. These may be
different nations, or culturally different groups, groups of individuals
selected on the basis of some known or unknown selection variable, groups
receiving different treatments, etc. In fact, they may be any set of exclusive
groups of individuals that are clearly defined. It is assumed that a battery
of tests has been administered to a sample of individuals from each population.
The battery of tests need not be the same for each group, nor need the number
of tests be the same. However, since we shall be concerned with characteristics
of the tests that are invariant over populations, it is necessary that some of
the tests in each battery are the same or at least content-wise equivalent.

Let $p_g$ be the number of tests administered to group $g$ and let $x_g$
be a vector of order $p_g$, representing the measurements obtained in group
$g$. We regard $x_g$ as a random vector with mean vector $\mu_g$ and
variance-covariance matrix $\Sigma_g$. It is assumed that a factor analysis
model holds in each population so that $x_g$ can be accounted for by $k_g$
common factors $f_g$ and $p_g$ unique factors $z_g$, as

$$x_g = \mu_g + \Lambda_g f_g + z_g, \tag{1}$$

with $\mathcal{E}(f_g) = 0$ and $\mathcal{E}(z_g) = 0$, and $\Lambda_g$ a factor
pattern of order $p_g \times k_g$. The usual factor analytic assumptions then
imply that

$$\Sigma_g = \Lambda_g \Phi_g \Lambda_g' + \Psi_g^2, \tag{2}$$

where $\Phi_g$ is the variance-covariance matrix of $f_g$ and $\Psi_g^2$ is the
diagonal variance-covariance matrix of $z_g$.

In addition to assuming that a factor analytic model holds in each
population, the model may specify that certain parameters in $\Lambda_g$,
$\Phi_g$, $\Psi_g$, $g = 1, 2, \ldots, m$, have assigned values and that some
set of unknown elements in $\Lambda_g$, $\Phi_g$ and $\Psi_g$ are the same for
all $g$. The most common situation is when the same battery has been
administered to each group and when the whole factor pattern $\Lambda_g$ is
assumed to be invariant over groups. This case will be considered separately
in section 3.
2.2 Identification of Parameters

Before an attempt is made to estimate a model of this kind, the
identification problem must be examined. The identification problem depends
on the specification of fixed, free and constrained parameters. Under a
given specification, each $\Lambda_g$, $\Phi_g$ and $\Psi_g$ generates one and
only one $\Sigma_g$, but it is well known that different $\Lambda_g$ and
$\Phi_g$ can generate the same $\Sigma_g$. It should be noted that if
$\Lambda_g$ is replaced by $\Lambda_g T_g^{-1}$ and $\Phi_g$ by
$T_g \Phi_g T_g'$, where $T_g$ is an arbitrary nonsingular matrix of order
$k_g \times k_g$, then $\Sigma_g$ is unchanged. Since $T_g$ has $k_g^2$
independent elements, this suggests that $k_g^2$ independent conditions should
be imposed on $\Lambda_g$ and/or $\Phi_g$ to make these uniquely defined and
hence that $\sum_{g=1}^{m} k_g^2$ independent conditions altogether should be
imposed. However, when equality constraints over groups are taken into
account, not all the elements of all the transformation matrices are
independent of each other and therefore a lesser number of conditions need to
be imposed. It is hard to give further specific rules in the general case.
For the special case when the whole factor pattern is invariant over groups,
however, a more precise consideration of the identification problem is given
in section 3.2. In other cases one should verify that the only
transformations $T_g$ that preserve the specification about fixed, free and
constrained parameters are identity matrices.
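The indeterminacy described in this section can be checked numerically. The following is a minimal sketch, assuming NumPy and made-up parameter values (the names `implied_cov`, `lam`, `phi`, `psi` and `t` are illustrative and not part of SIFASP): it replaces $\Lambda$ by $\Lambda T^{-1}$ and $\Phi$ by $T \Phi T'$ and confirms that the implied $\Sigma$ does not change.

```python
import numpy as np


def implied_cov(lam, phi, psi):
    """Sigma = Lambda Phi Lambda' + Psi^2, with psi holding the diagonal of Psi."""
    return lam @ phi @ lam.T + np.diag(psi ** 2)


# Hypothetical values for one group with p = 4 tests and k = 2 factors.
lam = np.array([[0.8, 0.0],
                [0.7, 0.1],
                [0.0, 0.9],
                [0.2, 0.6]])
phi = np.array([[1.0, 0.3],
                [0.3, 1.5]])
psi = np.array([0.6, 0.7, 0.5, 0.8])

# An arbitrary nonsingular transformation (determinant 3.5).
t = np.array([[2.0, 0.5],
              [-1.0, 1.5]])

lam_star = lam @ np.linalg.inv(t)   # Lambda T^{-1}
phi_star = t @ phi @ t.T            # T Phi T'

sigma = implied_cov(lam, phi, psi)
sigma_star = implied_cov(lam_star, phi_star, psi)
max_abs_diff = float(np.max(np.abs(sigma - sigma_star)))
```

Because `max_abs_diff` is zero up to rounding error, the data cannot distinguish the two parameter sets; only fixed or constrained elements can rule out such a `t`.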

2.3 Estimation and Testing of the Model

Let $N_g$ be the number of individuals in the sample from the $g$th
population and let $\bar{x}_g$ be the usual sample mean vector and $S_g$ the
usual sample variance-covariance matrix with $n_g = N_g - 1$ degrees of
freedom. The only requirement for the sampling procedure is that it produces
independent measurements for the different groups.

If we assume that $x_g$ has a multinormal distribution it follows that
$S_g$ has a Wishart distribution based on $\Sigma_g$ and $n_g$ degrees of
freedom. The logarithm of the likelihood for the $g$th sample is

$$\log L_g = -\tfrac{1}{2} n_g \left[\log|\Sigma_g| + \operatorname{tr}(S_g \Sigma_g^{-1})\right]. \tag{3}$$

Since the samples are independent, the log-likelihood for all the samples is

$$\log L = \sum_{g=1}^{m} \log L_g. \tag{4}$$

Maximum likelihood estimates of the unknown elements in $\Lambda_g$, $\Phi_g$,
$\Psi_g$, $g = 1, 2, \ldots, m$, may be obtained by maximizing $\log L$.
However, it is slightly more convenient to minimize

$$F = \tfrac{1}{2} \sum_{g=1}^{m} n_g \left[\log|\Sigma_g| + \operatorname{tr}(S_g \Sigma_g^{-1}) - \log|S_g| - p_g\right] \tag{5}$$

instead. At the minimum, $F$ equals minus the logarithm of the likelihood
ratio for testing the hypothesis implied by the model against the general
alternative that each $\Sigma_g$ is unconstrained. Therefore, twice the
minimum value of $F$ is approximately distributed, in large samples, as
$\chi^2$ with degrees of freedom equal to

$$d = \tfrac{1}{2} \sum_{g=1}^{m} p_g(p_g + 1) - t, \tag{6}$$

where $t$ is the total number of independent parameters estimated in the
model.
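As a rough illustration of the fit function and its degrees of freedom, the sketch below evaluates $F = \tfrac{1}{2}\sum_g n_g[\log|\Sigma_g| + \operatorname{tr}(S_g\Sigma_g^{-1}) - \log|S_g| - p_g]$ and $d = \tfrac{1}{2}\sum_g p_g(p_g+1) - t$ with NumPy. The function names and the small example matrix are assumptions made for illustration; this is not the SIFASP implementation.

```python
import numpy as np


def fit_function(s_list, sigma_list, n_list):
    """Multi-group ML discrepancy F; each argument has one entry per group."""
    total = 0.0
    for s, sigma, n in zip(s_list, sigma_list, n_list):
        p = s.shape[0]
        logdet_sigma = np.linalg.slogdet(sigma)[1]
        logdet_s = np.linalg.slogdet(s)[1]
        trace_term = np.trace(np.linalg.solve(sigma, s))  # tr(Sigma^{-1} S)
        total += 0.5 * n * (logdet_sigma + trace_term - logdet_s - p)
    return total


def degrees_of_freedom(p_list, t):
    """d = 1/2 sum_g p_g (p_g + 1) - t, with t independent parameters."""
    return sum(p * (p + 1) // 2 for p in p_list) - t


# Hypothetical 2 x 2 sample covariance matrix with n = 50.
s = np.array([[2.0, 0.5],
              [0.5, 1.0]])
f_perfect = fit_function([s], [s], [50])          # Sigma reproduces S exactly
f_misfit = fit_function([s], [np.eye(2)], [50])   # Sigma = I does not
d_example = degrees_of_freedom([9, 9, 9, 9], 100)
```

`F` is zero when every $\Sigma_g$ reproduces $S_g$ exactly and positive otherwise, so twice its minimum behaves as the likelihood-ratio statistic described above.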



2.4 Minimization Procedure

The function $F$ will be minimized numerically with respect to the
independent parameters using a modification of the method of Fletcher
and Powell [1963]. The application of this method makes use of exact
expressions for first-order derivatives and approximate expressions for
second-order derivatives of $F$.

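Let

$$\Omega_g = \Sigma_g^{-1}(\Sigma_g - S_g)\Sigma_g^{-1}, \qquad g = 1, 2, \ldots, m. \tag{7}$$

Then it follows from the corresponding results for a single population
(see e.g., Lawley & Maxwell, 1963, Chapter 6 or Jöreskog, 1969) that
for $g = 1, 2, \ldots, m$,

$$\partial F/\partial \Lambda_g = n_g \Omega_g \Lambda_g \Phi_g, \tag{8a}$$

$$\partial F/\partial \Phi_g = n_g \Lambda_g' \Omega_g \Lambda_g - \tfrac{1}{2} n_g \operatorname{diag}(\Lambda_g' \Omega_g \Lambda_g), \tag{8b}$$

$$\partial F/\partial \Psi_g = n_g \operatorname{diag}(\Omega_g)\, \Psi_g. \tag{8c}$$

We shall also need expressions for
$\mathcal{E}(\partial^2 F/\partial\theta_i \partial\theta_j)$, where $\theta_i$
and $\theta_j$ are any two parameters. If $\theta_i$ is an element of
$\Lambda_g$, $\Phi_g$ or $\Psi_g$ and $\theta_j$ is an element of $\Lambda_h$,
$\Phi_h$ or $\Psi_h$, $g \neq h$, then
$\mathcal{E}(\partial^2 F/\partial\theta_i \partial\theta_j)$ is zero.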
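Otherwise, if both $\theta_i$ and $\theta_j$ are elements of $\Lambda_g$,
$\Phi_g$ or $\Psi_g$, the required second-order derivatives can be expressed
in terms of the elements of

$$\xi = \Sigma^{-1}\Lambda, \tag{9a}$$

$$\eta = \Sigma^{-1}\Lambda\Phi = \xi\Phi, \tag{9b}$$

$$\alpha = \Lambda'\Sigma^{-1}\Lambda = \Lambda'\xi, \tag{9c}$$

$$\beta = \Phi\Lambda'\Sigma^{-1}\Lambda = \Phi\alpha, \tag{9d}$$

$$\gamma = \Phi\Lambda'\Sigma^{-1}\Lambda\Phi = \beta\Phi, \tag{9e}$$

as

$$\mathcal{E}(\partial^2 F/\partial\lambda_{ir}\,\partial\lambda_{js}) = n(\sigma^{ij}\gamma_{rs} + \eta_{is}\eta_{jr}), \tag{10a}$$

$$\mathcal{E}(\partial^2 F/\partial\lambda_{ir}\,\partial\phi_{st}) = (n/2)(2 - \delta_{st})(\xi_{is}\beta_{rt} + \xi_{it}\beta_{rs}), \tag{10b}$$

$$\mathcal{E}(\partial^2 F/\partial\lambda_{ir}\,\partial\psi_{jj}) = 2n\,\sigma^{ij}\eta_{jr}\psi_{jj}, \tag{10c}$$

$$\mathcal{E}(\partial^2 F/\partial\phi_{rs}\,\partial\phi_{tu}) = (n/4)(2 - \delta_{rs})(2 - \delta_{tu})(\alpha_{rt}\alpha_{su} + \alpha_{ru}\alpha_{st}), \tag{10d}$$

$$\mathcal{E}(\partial^2 F/\partial\phi_{rs}\,\partial\psi_{jj}) = n(2 - \delta_{rs})\,\xi_{jr}\xi_{js}\psi_{jj}, \tag{10e}$$

$$\mathcal{E}(\partial^2 F/\partial\psi_{ii}\,\partial\psi_{jj}) = 2n(\sigma^{ij})^2\psi_{ii}\psi_{jj}. \tag{10f}$$

Here $\sigma^{ij}$ denotes the $(i, j)$ element of $\Sigma^{-1}$ and
$\delta_{rs}$ is the Kronecker delta. We have omitted the subscript $g$ for
simplicity of notation.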
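The first-order derivatives used by the minimization can be checked against finite differences. The sketch below, for a single group, assumes the forms $\Omega = \Sigma^{-1}(\Sigma - S)\Sigma^{-1}$, $\partial F/\partial\Lambda = n\Omega\Lambda\Phi$, $\partial F/\partial\Phi = n\Lambda'\Omega\Lambda - \tfrac{1}{2}n\operatorname{diag}(\Lambda'\Omega\Lambda)$ and $\partial F/\partial\Psi = n\operatorname{diag}(\Omega)\Psi$; all data and helper names are made up for illustration.

```python
import numpy as np


def f_group(lam, phi, psi, s, n):
    """One group's contribution to F."""
    sigma = lam @ phi @ lam.T + np.diag(psi ** 2)
    p = s.shape[0]
    return 0.5 * n * (np.linalg.slogdet(sigma)[1]
                      + np.trace(np.linalg.solve(sigma, s))
                      - np.linalg.slogdet(s)[1] - p)


def gradients(lam, phi, psi, s, n):
    """Analytic first derivatives with respect to Lambda, Phi and Psi."""
    sigma = lam @ phi @ lam.T + np.diag(psi ** 2)
    sigma_inv = np.linalg.inv(sigma)
    omega = sigma_inv @ (sigma - s) @ sigma_inv
    g_lam = n * omega @ lam @ phi
    inner = lam.T @ omega @ lam
    g_phi = n * inner - 0.5 * n * np.diag(np.diag(inner))
    g_psi = n * np.diag(omega) * psi
    return g_lam, g_phi, g_psi


def central_difference(func, h=1e-6):
    return (func(h) - func(-h)) / (2.0 * h)


rng = np.random.default_rng(1)
p, k, n = 4, 2, 59
s = np.cov(rng.normal(size=(n + 1, p)), rowvar=False)  # synthetic sample
lam = rng.normal(size=(p, k))
phi = np.array([[1.0, 0.3], [0.3, 1.2]])
psi = np.full(p, 0.8)
g_lam, g_phi, g_psi = gradients(lam, phi, psi, s, n)


def bump_lam(h):
    x = lam.copy()
    x[0, 1] += h
    return f_group(x, phi, psi, s, n)


def bump_phi_offdiag(h):
    x = phi.copy()
    x[0, 1] += h   # phi_01 and phi_10 are one parameter,
    x[1, 0] += h   # so both entries move together
    return f_group(lam, x, psi, s, n)


def bump_phi_diag(h):
    x = phi.copy()
    x[0, 0] += h
    return f_group(lam, x, psi, s, n)


def bump_psi(h):
    x = psi.copy()
    x[2] += h
    return f_group(lam, phi, x, s, n)


num_lam = central_difference(bump_lam)
num_phi_off = central_difference(bump_phi_offdiag)
num_phi_diag = central_difference(bump_phi_diag)
num_psi = central_difference(bump_psi)
```

The off-diagonal elements of $\Phi$ are perturbed symmetrically because each pair counts as a single parameter; this is what produces the $\tfrac{1}{2}\operatorname{diag}$ correction in the derivative with respect to $\Phi$.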

The function $F$ is regarded as a function of the elements of $\Lambda_g$,
$\Phi_g$, $\Psi_g$, $g = 1, 2, \ldots, m$, and is to be minimized with respect
to these, taking into account that some elements may be fixed and some may be
constrained to be equal to others. Such a minimization problem may be solved
as follows.

Let $\theta_g$ be a vector of all the elements in $\Lambda_g$, $\Phi_g$ and
$\Psi_g$ arranged in a prescribed order. Since $\Phi_g$ is symmetric, only the
elements in the lower half and the diagonal are counted. Then $\theta_g$ is of
order $r_g = p_g k_g + \tfrac{1}{2} k_g(k_g + 1) + p_g$. Let
$\theta' = (\theta_1', \theta_2', \ldots, \theta_m')$. Then $\theta$ consists
of all the elements of all the parameter matrices and is of order
$r = r_1 + r_2 + \cdots + r_m$. The function $F$ may now be regarded as a
function $F(\theta)$ of $\theta_1, \theta_2, \ldots, \theta_r$, which is
continuous and has continuous derivatives $\partial F/\partial\theta_i$ and
$\partial^2 F/\partial\theta_i\partial\theta_j$ of first and second order,
except where any $\Sigma_g$ is singular. The totality of these derivatives is
represented by a gradient vector $\partial F/\partial\theta$ and a symmetric
second order derivative matrix $\partial^2 F/\partial\theta\partial\theta'$.

The vector $\partial F/\partial\theta$ of order $r$ is formed by arranging the
elements of the derivative matrices (8a)-(8c) in the same order as the
elements of $\Lambda_g$, $\Phi_g$ and $\Psi_g$, $g = 1, 2, \ldots, m$, in
$\theta$. As an approximation to the $r \times r$ matrix
$\partial^2 F/\partial\theta\partial\theta'$ we use
$\mathcal{E}(\partial^2 F/\partial\theta\partial\theta')$, which is of the form

$$\mathcal{E}(\partial^2 F/\partial\theta\partial\theta') =
\begin{pmatrix}
\mathcal{E}(\partial^2 F/\partial\theta_1\partial\theta_1') & 0 & \cdots & 0 \\
0 & \mathcal{E}(\partial^2 F/\partial\theta_2\partial\theta_2') & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \mathcal{E}(\partial^2 F/\partial\theta_m\partial\theta_m')
\end{pmatrix}, \tag{11}$$

where $\mathcal{E}(\partial^2 F/\partial\theta_g\partial\theta_g')$ is a
symmetric matrix of order $r_g \times r_g$ formed by computing (10a)-(10f) and
arranging these so that the order of rows and columns corresponds to the order
of the parameters in $\theta_g$.
Now let some $r - s$ of the $\theta$'s be fixed and denote the remaining
$\theta$'s by $\pi_1, \pi_2, \ldots, \pi_s$, $s \le r$. The function $F$ is now
regarded as a function $G(\pi)$ of $\pi_1, \pi_2, \ldots, \pi_s$. Derivatives
$\partial G/\partial\pi$ and $\mathcal{E}(\partial^2 G/\partial\pi\partial\pi')$
are obtained from $\partial F/\partial\theta$ and
$\mathcal{E}(\partial^2 F/\partial\theta\partial\theta')$ by omitting rows and
columns corresponding to the fixed $\theta$'s. Among
$\pi_1, \pi_2, \ldots, \pi_s$, let there be some $t$ distinct and independent
parameters denoted $\kappa_1, \kappa_2, \ldots, \kappa_t$, $t \le s$, so that
each $\pi_i$ is equal to one and only one $\kappa_j$ but possibly several
$\pi$'s equal the same $\kappa$. Let $K = (k_{ij})$ be a matrix of order
$s \times t$ with elements $k_{ij} = 1$ if $\pi_i = \kappa_j$ and $k_{ij} = 0$
otherwise. The function $G$ (or $F$) is now a function $H(\kappa)$ of the
independent arguments $\kappa_1, \kappa_2, \ldots, \kappa_t$ and we have

$$\partial H/\partial\kappa = K'(\partial G/\partial\pi), \tag{12}$$

$$\mathcal{E}(\partial^2 H/\partial\kappa\partial\kappa') = K'\,\mathcal{E}(\partial^2 G/\partial\pi\partial\pi')\,K. \tag{13}$$

Thus, the first-order and expected second-order derivatives of $H$ are simple
sums of the corresponding derivatives of $G$.

For the minimization of $H(\kappa)$ we use a modification of the method of
Fletcher and Powell [1963] for which a computer program has been written by
Gruvaeus and Jöreskog [1970]. This method makes use of a symmetric matrix
$E$ of order $t \times t$, which is evaluated in each iteration. Initially $E$
is any positive definite matrix approximating the inverse of
$\partial^2 H/\partial\kappa\partial\kappa'$. In subsequent iterations $E$ is
improved, using information built up about the function so that ultimately $E$
converges to an approximation of the inverse of
$\partial^2 H/\partial\kappa\partial\kappa'$ at the minimum. If $t$ is large,
the number of iterations may be excessive but can be considerably decreased by
the provision of a good starting point for $\kappa$ and a good initial
estimate of $E$.

In principle, a good initial estimate of $E$ may be obtained by computing
$\partial^2 H/\partial\kappa\partial\kappa'$ at the starting point and then
inverting this matrix. However, in our problem, the second-order derivatives
are rather complicated and time-consuming to compute. Instead, we therefore
use estimates of the second-order derivatives provided by the information
matrix

$$\mathcal{E}(\partial^2 H/\partial\kappa\partial\kappa') = \mathcal{E}\!\left(\frac{\partial H}{\partial\kappa}\,\frac{\partial H}{\partial\kappa'}\right). \tag{14}$$

In addition to being more easily evaluated, this matrix also yields other
valuable information. The inverse of
$\mathcal{E}(\partial^2 H/\partial\kappa\partial\kappa')$ evaluated at the
minimum of $H$ is an estimate of the variance-covariance matrix of the
estimated parameters $\hat\kappa_1, \hat\kappa_2, \ldots, \hat\kappa_t$. This
may be used to obtain standard errors of the estimated parameters.
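The bookkeeping for equality constraints can be sketched in a few lines. In this illustrative example (the label vector, gradient and information matrix are made up), `labels[i]` records which independent parameter $\kappa_j$ the free parameter $\pi_i$ equals; the 0/1 matrix $K$ then collapses the gradient as $K'(\partial G/\partial\pi)$ and the expected second-derivative matrix as $K'\mathcal{E}(\partial^2 G/\partial\pi\partial\pi')K$.

```python
import numpy as np


def constraint_matrix(labels):
    """Build the s x t matrix K with k_ij = 1 if pi_i equals kappa_j, else 0."""
    s, t = len(labels), max(labels) + 1
    k = np.zeros((s, t))
    k[np.arange(s), labels] = 1.0
    return k


# Hypothetical pattern: pi_1 = pi_3 = kappa_1, pi_2 = pi_5 = kappa_2, pi_4 = kappa_3.
labels = [0, 1, 0, 2, 1]
k_mat = constraint_matrix(labels)

grad_g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up dG/dpi
info_g = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up expected Hessian of G

grad_h = k_mat.T @ grad_g                      # sums entries sharing a kappa
info_h = k_mat.T @ info_g @ k_mat              # same for the information matrix

# Large-sample standard errors from the inverse information matrix.
std_err = np.sqrt(np.diag(np.linalg.inv(info_h)))
```

Each entry of `grad_h` is the sum of the entries of `grad_g` that share a label, which is the "simple sums" property noted in the text.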

The starting point $\kappa$ may be chosen arbitrarily but the closer it is to
the final solution the fewer iterations will be required to find the solution.
The minimization method converges quadratically from an arbitrary starting
point to a local minimum of the function. If several local minima exist
there is no guarantee that the method will converge to the absolute minimum.

2.5 Computer Program

A computer program, SIFASP, that performs all the computations described
in the previous sections has been written in FORTRAN IV and a write-up for
this is available [van Thillo & Jöreskog, 1970]. This program reads an
observed covariance matrix or a correlation matrix and a vector of standard
deviations for each group, a set of pattern matrices specifying the fixed,
free and constrained parameters and a set of matrices of start values for
the minimization. It then minimizes the function $F$ as described in the
previous section to obtain the maximum likelihood solution for each group.
These are then printed together with residuals, i.e., differences between
observed and reproduced variances and covariances, and the $\chi^2$ measure of
overall fit.

The computer program assumes that the number of variables and the number
of common factors are the same for each group. This is no loss of generality,
since it can always be achieved by the introduction of pseudovariables and
pseudofactors in some groups as follows. Each pseudovariable has unit
observed variance, zero observed covariances with every other variable, zero
factor loadings on each factor including the pseudofactors and unit unique
variance. Each pseudofactor has unit variance and zero covariance with
every other factor and pseudofactor. It is readily verified that such
pseudovariables and pseudofactors have no effect on the likelihood function
whatsoever.
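The claim that pseudovariables and pseudofactors leave the likelihood untouched can be verified directly. The sketch below pads one hypothetical group with two pseudovariables and one pseudofactor and compares the value of the fit function before and after; the helper names and numbers are illustrative assumptions, not SIFASP code.

```python
import numpy as np


def f_single(s, sigma, n):
    """Fit function contribution of one group."""
    p = s.shape[0]
    return 0.5 * n * (np.linalg.slogdet(sigma)[1]
                      + np.trace(np.linalg.solve(sigma, s))
                      - np.linalg.slogdet(s)[1] - p)


def implied_cov(lam, phi, psi):
    return lam @ phi @ lam.T + np.diag(psi ** 2)


def pad_with_pseudo(s, lam, phi, psi, extra_vars, extra_factors):
    """Add pseudovariables and pseudofactors as described in the text."""
    p, k = lam.shape
    s_pad = np.eye(p + extra_vars)          # unit variance, zero covariances
    s_pad[:p, :p] = s
    lam_pad = np.zeros((p + extra_vars, k + extra_factors))  # zero loadings
    lam_pad[:p, :k] = lam
    phi_pad = np.eye(k + extra_factors)     # unit variance, zero covariances
    phi_pad[:k, :k] = phi
    psi_pad = np.ones(p + extra_vars)       # unit unique variance
    psi_pad[:p] = psi
    return s_pad, lam_pad, phi_pad, psi_pad


s = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.2],
              [0.3, 0.2, 1.0]])
lam = np.array([[0.7], [0.6], [0.5]])
phi = np.array([[1.0]])
psi = np.array([0.7, 0.8, 0.9])
n = 100

f_original = f_single(s, implied_cov(lam, phi, psi), n)
s_pad, lam_pad, phi_pad, psi_pad = pad_with_pseudo(s, lam, phi, psi, 2, 1)
f_padded = f_single(s_pad, implied_cov(lam_pad, phi_pad, psi_pad), n)
```

Each pseudovariable adds one to both $\operatorname{tr}(S\Sigma^{-1})$ and $p$ and nothing to either log-determinant, so the two values agree.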

The observed variables may be rescaled initially as described in
section 3.4. This is sometimes convenient when the observed variables have
arbitrary units of measurement. In the special case of an invariant factor
pattern, as described in the next section, the factors in the maximum
likelihood solutions may be rescaled as shown in section 3.3.

The implementation of the minimization algorithm is simpler if all
matrices are stored as singly subscripted arrays. This saves space, since
only the lower halves of symmetric matrices need to be stored, and makes
the program more efficient. The program makes use of a set of subroutines
for matrix algebra on matrices stored as singly subscripted arrays. A further
important advantage with this technique is the flexibility in the choice
of m, p and k. Thus, in the same space as one can have m = 4, p = 12
and k = 5 one can also have m = 2, p = 17 and k = 5 or m = 1,
p = 2t and k = 12. 

The computer program works with one group (m = 1) as well as with more
groups. When m is one the model is the same as that of Jöreskog [1969]
but it is now possible to handle not only fixed parameters but also equality
constraints between parameters. The new program SIFASP, therefore, makes
the old program RMLFA obsolete. SIFASP can handle many types of factor
analytic solutions.



3. A Model of Factorial Invariance

3.1 The Model

Perhaps the most common application of the method just described will
be the case when the same tests have been administered in each population
and when it is hypothesized that the factor pattern $\Lambda$ is invariant over
populations. Meredith [1964a] has shown that such a model will occur under
certain conditions, when the populations are subpopulations derived from
a parent population by selection on some external variables. Although
this model is a special case of the general model described in the previous
section, it deserves a separate discussion.

In this case $p_1 = p_2 = \cdots = p_m = p$ and
$k_1 = k_2 = \cdots = k_m = k$, the matrices $\Sigma_g$ and $\Psi_g$,
$g = 1, 2, \ldots, m$, are all of order $p \times p$ and the $\Phi_g$,
$g = 1, 2, \ldots, m$, are all of order $k \times k$. The common factor
pattern $\Lambda$ is of order $p \times k$. The regression of $x_g$ on $f_g$
is [cf. (1)]

$$x_g = \mu_g + \Lambda f_g + z_g \tag{15}$$

and the variance-covariance matrix $\Sigma_g$ is

$$\Sigma_g = \Lambda \Phi_g \Lambda' + \Psi_g^2. \tag{16}$$

In the special case of two populations, $m = 2$, a stricter form of invariance
was considered by Lawley and Maxwell [1963, Chapter 3]. This requires not
only the regression matrix $\Lambda$ in (15) to be invariant but also the
variances about the regression, i.e., $\Psi_1^2 = \Psi_2^2$. This type of
restriction can easily be incorporated using the general approach of the
preceding section.



3.2 Identification of Parameters

Suppose that the $\Lambda$ in (16) is replaced by
$\Lambda^* = \Lambda T^{-1}$ and each $\Phi_g$ is replaced by
$\Phi_g^* = T \Phi_g T'$, $g = 1, 2, \ldots, m$, where $T$ is an arbitrary
nonsingular matrix of order $k \times k$. Then each $\Sigma_g$ remains the
same so that the function $F$ in (5) is unaltered. Since the matrix $T$ has
$k^2$ independent elements, this means that at least $k^2$ independent
conditions must be imposed on the parameters in $\Lambda$,
$\Phi_1, \Phi_2, \ldots, \Phi_m$ to make these uniquely defined.

Within the framework of the general procedure of the previous section,
the most convenient way of doing this is to let all the $\Phi_g$ be free and
to fix one nonzero element and at least $k - 1$ zeros in each column of
$\Lambda$. In an exploratory study one can fix exactly $k - 1$ zeros in almost
arbitrary positions. For example one may choose zero loadings where one
thinks there should be "small" loadings in the factor pattern. The resulting
solution may be rotated further, if desired, to facilitate better
interpretation. In a confirmatory study, on the other hand, the positions of
the fixed zeros, which often exceed $k - 1$ in each column, are given a
priori by an hypothesis and the resulting solution cannot be rotated
without destroying the fixed zeros.



3.3 Scaling of Factors

The fixed nonzero loading in each column of $\Lambda$ can have any value.
This is only used to fix a scale for each factor that is common to all
groups. When the maximum likelihood solution has been obtained, the factors
may be rescaled so that their average variance is unity. This rescaling is
obtained as follows. Let

$$\Phi = (1/n) \sum_{g=1}^{m} n_g \Phi_g, \tag{17}$$

with $n = \sum_{g=1}^{m} n_g$, and

$$D = (\operatorname{diag} \Phi)^{-1/2}. \tag{18}$$

Then the rescaled solution is

$$\Lambda^* = \Lambda D^{-1}, \tag{19}$$

$$\Phi_g^* = D \Phi_g D, \qquad g = 1, 2, \ldots, m. \tag{20}$$

The matrix $\Lambda^*$ has zeros wherever $\Lambda$ has zeros but the fixed
nonzeros in $\Lambda$ have changed their values. The weighted average of the
$\Phi_g^*$ is a correlation matrix.
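A small numerical sketch of this rescaling, under the assumption that $D = (\operatorname{diag}\Phi)^{-1/2}$ with $\Phi$ the $n_g$-weighted average of the $\Phi_g$, $\Lambda^* = \Lambda D^{-1}$ and $\Phi_g^* = D\Phi_g D$. The loadings, factor covariance matrices and sample sizes below are invented for illustration.

```python
import numpy as np


def rescale_factors(lam, phi_list, n_list):
    """Rescale so the weighted average factor covariance has unit diagonal."""
    n = float(sum(n_list))
    phi_avg = sum(ng * phi for ng, phi in zip(n_list, phi_list)) / n
    d = np.diag(1.0 / np.sqrt(np.diag(phi_avg)))
    lam_star = lam @ np.linalg.inv(d)
    phi_star = [d @ phi @ d for phi in phi_list]
    return lam_star, phi_star


lam = np.array([[2.0, 0.0],
                [1.4, 0.0],
                [0.0, 3.0],
                [0.5, 2.1]])          # fixed nonzeros 2.0 and 3.0 are arbitrary
phi_list = [np.array([[0.5, 0.1], [0.1, 0.2]]),
            np.array([[0.9, 0.3], [0.3, 0.4]])]
n_list = [76, 78]

lam_star, phi_star = rescale_factors(lam, phi_list, n_list)
n_total = float(sum(n_list))
phi_star_avg = sum(ng * ph for ng, ph in zip(n_list, phi_star)) / n_total
common_before = [lam @ ph @ lam.T for ph in phi_list]
common_after = [lam_star @ ph @ lam_star.T for ph in phi_star]
```

The common part $\Lambda\Phi_g\Lambda'$ of every $\Sigma_g$ is unchanged, so the fit is unaffected; only the metric of the factors moves.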
3.4 Scaling of Observed Variables

When the units of measurement in the different tests are arbitrary,
it is usually convenient, though not necessary, to rescale the observed
variables before the factor analysis. Let

$$S = (1/n) \sum_{g=1}^{m} n_g S_g, \tag{21}$$

with $n = \sum_{g=1}^{m} n_g$ as before, and let

$$D = (\operatorname{diag} S)^{-1/2}. \tag{22}$$

Then the variance-covariance matrices for the rescaled variables are

$$S_g^* = D S_g D. \tag{23}$$

The weighted average of the $S_g^*$ is a correlation matrix. The advantage
of this rescaling is that, when combined with the rescaling of the factors of
the previous section, the factor loadings are of the same order of magnitude
as usual when correlation matrices are analyzed and when factors are
standardized to unit variances. This makes it easier to choose start
values for the minimization (see section 3.5) and interpret the results.

It should be pointed out that it is not permissible to standardize
the variables in each group and to analyze the correlation matrices
instead of the variance-covariance matrices. This violates the likelihood
function (4), which is based on the distribution of the observed variances
and covariances. Invariance of factor patterns is expected to hold only
when the standardization of both tests and factors is relaxed.
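The difference between pooled rescaling and group-wise standardization can be seen in a short sketch. It assumes the pooled matrix $S = (1/n)\sum_g n_g S_g$ and a single scaling matrix $D = (\operatorname{diag} S)^{-1/2}$ applied to every group; the two covariance matrices and sample sizes are invented.

```python
import numpy as np


def pooled_rescale(s_list, n_list):
    """Apply one common scaling D to every group's covariance matrix."""
    n = float(sum(n_list))
    s_pooled = sum(ng * s for ng, s in zip(n_list, s_list)) / n
    d = np.diag(1.0 / np.sqrt(np.diag(s_pooled)))
    return [d @ s @ d for s in s_list], d


s_list = [np.array([[4.0, 1.0], [1.0, 1.0]]),
          np.array([[2.0, 0.5], [0.5, 3.0]])]
n_list = [10, 30]

s_star, d = pooled_rescale(s_list, n_list)
n_total = float(sum(n_list))
s_star_avg = sum(ng * s for ng, s in zip(n_list, s_star)) / n_total
group_variances = [np.diag(s) for s in s_star]
```

Only the weighted average has unit diagonal; the individual `s_star` matrices keep their between-group differences in variance, which is the information that separate standardization within each group would discard.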



3.5 Choice of Start Values

In a medium-sized study of say 4 groups, 20 variables and 5 factors,
the number of free parameters to estimate may well exceed 200. To obtain
the maximum likelihood estimates, a function of over 200 variables has to
be minimized. This is not an easy task even on today's large computers.
To reduce the computer time as much as possible it is necessary to choose
good start values for the minimization. This can be done by doing some
preliminary runs with the same computer program before the overall
estimation is attempted.

1. Using $m = 1$ and the pooled correlation matrix $R = DSD$,
   where $D$ is given by (22), obtain an oblique maximum likelihood
   solution with the fixed zeros in $\Lambda$ and the diagonal elements
   of $\Phi$ equal to unity. Let the estimate of $\Lambda$ so obtained
   be denoted $\Lambda^{(0)}$.

2. For each group separately, using $m = 1$ and $S_g^*$, obtain an
   oblique maximum likelihood solution with the whole $\Lambda$ fixed
   equal to $\Lambda^{(0)}$ and with $\Phi_g$ and $\Psi_g$ free. Let the
   resulting estimates be denoted $\Phi_g^{(0)}$ and $\Psi_g^{(0)}$,
   $g = 1, 2, \ldots, m$.

3. Then $\Lambda^{(0)}$, $\Phi_1^{(0)}, \ldots, \Phi_m^{(0)}$,
   $\Psi_1^{(0)}, \ldots, \Psi_m^{(0)}$ provide good start values for the
   overall minimization with the largest element in each column of
   $\Lambda^{(0)}$ fixed, in addition to the fixed zeros.

If the model also specifies that the $\Psi_g$ should be invariant over
groups, one uses as start values for the common $\Psi$ the estimate
$\Psi^{(0)}$ obtained in step 1, and step 2 is done with $\Psi_g$ fixed at
$\Psi^{(0)}$.



3.6 Testing of Hypotheses and Strategy of Analysis 

Suppose $H_0$ and $H_1$ represent two models under different specifications
of fixed, free and constrained parameters, both models fitting the
general framework of section 2.1. Then it is possible, in large samples,
to test the model $H_0$ against the model $H_1$, by estimating each of them
separately and comparing their goodness of fit values. The difference
in $\chi^2$ is asymptotically a $\chi^2$ with degrees of freedom equal to the
corresponding difference in degrees of freedom.

In an exploratory study there are various hypotheses that may be 
tested and it seems best to proceed stepwise in a certain order. 

One begins by testing the hypothesis of equality of covariance matrices,
i.e.,

$$H_\Sigma: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_m. \tag{24}$$

This may be tested by using the test statistic

$$M = n \log|S| - \sum_{g=1}^{m} n_g \log|S_g|, \tag{25}$$

where $S$ is given by (21). Under the hypothesis, $M$ is distributed
approximately as $\chi^2$ with $d_\Sigma = \tfrac{1}{2}(m - 1)p(p + 1)$ degrees
of freedom. As shown by Box [1949], the approximation to the $\chi^2$
distribution is improved if $M$ is multiplied by a certain constant. When $p$
or $m$ is large, Box suggests a transformation to an $F$ distribution.

It should be noted that the test statistic (25) may be obtained in SIFASP
by specifying $k = p$, $\Lambda_g = I$, $\Psi_g = 0$, $g = 1, 2, \ldots, m$,
and $\Phi_1 = \Phi_2 = \cdots = \Phi_m$. The maximum likelihood estimate of
the common $\Phi$ will then be the pooled $S$ as defined in (21). If the tests
are scaled originally as described in section 3.4, this $S$ is a correlation
matrix $R$.

If the hypothesis is found to be tenable every characteristic common to
all groups can be obtained from the pooled covariance matrix $S$ or the
correlation matrix $R$ and there is no need to analyze each group separately
or simultaneously.
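As a minimal sketch of the test of equal covariance matrices, the function below computes $M = n\log|S| - \sum_g n_g \log|S_g|$ with $S$ the pooled matrix, together with its degrees of freedom $\tfrac{1}{2}(m-1)p(p+1)$. The multiplying constant and the $F$ transformation attributed to Box are deliberately left out, and the example matrices are invented.

```python
import numpy as np


def equality_test_statistic(s_list, n_list):
    """Return (M, degrees of freedom) for the hypothesis of equal covariances."""
    n = float(sum(n_list))
    s_pooled = sum(ng * s for ng, s in zip(n_list, s_list)) / n
    m_stat = n * np.linalg.slogdet(s_pooled)[1] - sum(
        ng * np.linalg.slogdet(s)[1] for ng, s in zip(n_list, s_list))
    m, p = len(s_list), s_list[0].shape[0]
    df = (m - 1) * p * (p + 1) // 2
    return m_stat, df


s_a = np.array([[4.0, 1.0], [1.0, 1.0]])
s_b = np.array([[2.0, 0.5], [0.5, 3.0]])

m_equal, df_equal = equality_test_statistic([s_a, s_a], [20, 40])
m_unequal, df_unequal = equality_test_statistic([s_a, s_b], [20, 40])

# Degrees of freedom for four groups and nine tests (identity matrices are
# placeholders; only the dimensions matter for the count).
_, df_four_groups = equality_test_statistic([np.eye(9)] * 4, [76, 78, 73, 70])
```

`M` is zero when all sample matrices coincide and grows as they diverge; with four groups and nine tests the count is 135 degrees of freedom.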

If, on the other hand, the hypothesis of equality of covariance matrices
is untenable, one may want to investigate similarities and differences in
factor structures. For this purpose, a sequence of hypotheses, such that each
hypothesis is a special case of the preceding, will now be considered. The
first hypothesis is the hypothesis of equality of number of common factors,
i.e.,

$$H_k: k_1 = k_2 = \cdots = k_m = \text{a specified number } k. \tag{26}$$

This may be tested by doing an unrestricted factor analysis [Jöreskog, 1969]
on each $S_g$ (or $S_g^*$ or the corresponding correlation matrix) separately,
using the same number of common factors for each group. The analyses may
be done by Jöreskog's [1967a, b] method UMLFA but can also be done with the
computer program SIFASP. In SIFASP one uses $m = 1$ and fixes $k^2$ elements
in $\Lambda_g$ and/or $\Phi_g$; for example, to obtain an orthogonal solution
one can choose $\Phi_g = I$ and $\tfrac{1}{2}k(k - 1)$ zeros in $\Lambda_g$ and
to obtain an oblique solution one can choose $\operatorname{diag}\Phi_g = I$
and $k(k - 1)$ zeros in $\Lambda_g$. Each analysis gives a $\chi^2$ with
$\tfrac{1}{2}[(p - k)^2 - (p + k)]$ degrees of freedom. Since these
$\chi^2$'s are independent they may be added for each group to obtain a
$\chi^2$, $\chi^2_k$ say, with $d_k = \tfrac{1}{2}m[(p - k)^2 - (p + k)]$
degrees of freedom, which may be used to test the overall hypothesis.

If the hypothesis of a common number of factors is found tenable, one
may proceed to test the hypothesis of an invariant factor pattern, i.e.,

$$H_\Lambda: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_m. \tag{27}$$

The common factor pattern $\Lambda$ may either be completely unspecified or be
specified to have zeros in certain positions. If $\Lambda$ is unspecified, one
fixes $k - 1$ zeros and one nonzero value in each column almost arbitrarily.
If $\Lambda$ is specified to have zeros in certain positions, one fixes an
arbitrary nonzero element in each column in addition. There will then be
$k^2$ fixed elements in $\Lambda$ in the unspecified case and $q > k^2$ in the
specified case. To obtain a $\chi^2$ for this hypothesis, one estimates
$\Lambda$, $\Phi_1, \Phi_2, \ldots, \Phi_m$, $\Psi_1, \Psi_2, \ldots, \Psi_m$
simultaneously, yielding a minimum value of the function $F$. Twice this
minimum value is a $\chi^2$, $\chi^2_\Lambda$ say, with degrees of freedom

$$d_\Lambda = \tfrac{1}{2}mp(p + 1) - pk + q - \tfrac{1}{2}mk(k + 1) - mp,$$

where $q = k^2$ in the unspecified case. To test the hypothesis $H_\Lambda$,
given that $H_k$ holds, one uses
$\chi^2_{\Lambda \cdot k} = \chi^2_\Lambda - \chi^2_k$ with
$d_{\Lambda \cdot k} = d_\Lambda - d_k$ degrees of freedom.
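If this hypothesis is found tenable one may proceed to test the hypothesis

$$H_{\Lambda\Psi}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_m; \quad \Psi_1 = \Psi_2 = \cdots = \Psi_m. \tag{28}$$

To do so one has to estimate $\Lambda$, $\Phi_1, \Phi_2, \ldots, \Phi_m$ and
$\Psi$ under $H_{\Lambda\Psi}$. This again gives a minimum value of $F$ which
when multiplied by two gives $\chi^2_{\Lambda\Psi}$ with

$$d_{\Lambda\Psi} = \tfrac{1}{2}mp(p + 1) - pk + q - \tfrac{1}{2}mk(k + 1) - p$$

degrees of freedom. To test $H_{\Lambda\Psi}$ against $H_\Lambda$ one uses
$\chi^2_{\Lambda\Psi \cdot \Lambda} = \chi^2_{\Lambda\Psi} - \chi^2_\Lambda$
with $d_{\Lambda\Psi \cdot \Lambda} = d_{\Lambda\Psi} - d_\Lambda$ degrees of
freedom.

If the hypothesis $H_{\Lambda\Psi}$ is found tenable one may want to test the
hypothesis

$$H_{\Lambda\Phi\Psi}: \Lambda_1 = \Lambda_2 = \cdots = \Lambda_m; \quad \Phi_1 = \Phi_2 = \cdots = \Phi_m; \quad \Psi_1 = \Psi_2 = \cdots = \Psi_m. \tag{29}$$

This hypothesis is included in $H_\Sigma$ but is stronger than $H_\Sigma$ since
$H_\Sigma$ includes also the cases when the common $\Sigma$ is not of the form

$$\Sigma = \Lambda\Phi\Lambda' + \Psi^2. \tag{30}$$

This hypothesis can be tested directly on the basis of the pooled $S$
in (21). The test of $H_{\Lambda\Phi\Psi}$ against $H_\Sigma$ uses a $\chi^2$
with

$$d_{\Lambda\Phi\Psi \cdot \Sigma} = \tfrac{1}{2}p(p + 1) - pk + q - \tfrac{1}{2}k(k + 1) - p$$

degrees of freedom.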
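The degrees of freedom for the sequence of hypotheses are plain parameter counts and can be tabulated mechanically. The sketch below encodes them as small functions (the function names are illustrative): $\tfrac{1}{2}m[(p-k)^2-(p+k)]$ for a common number of factors, $\tfrac{1}{2}mp(p+1) - pk + q - \tfrac{1}{2}mk(k+1) - mp$ for an invariant pattern, the same with $-p$ in place of $-mp$ when the unique variances are also invariant, and $\tfrac{1}{2}p(p+1) - pk + q - \tfrac{1}{2}k(k+1) - p$ for the fully invariant model fitted to the pooled matrix. The values $m = 4$, $p = 9$, $k = 3$ are used only as an example.

```python
def df_common_k(m, p, k):
    """m separate unrestricted analyses with k factors each."""
    return m * ((p - k) ** 2 - (p + k)) // 2


def df_invariant_pattern(m, p, k, q=None):
    """Common Lambda with q fixed elements; Phi_g and Psi_g free."""
    q = k * k if q is None else q
    return m * p * (p + 1) // 2 - p * k + q - m * k * (k + 1) // 2 - m * p


def df_invariant_pattern_and_unique(m, p, k, q=None):
    """Common Lambda and common Psi; Phi_g free."""
    q = k * k if q is None else q
    return m * p * (p + 1) // 2 - p * k + q - m * k * (k + 1) // 2 - p


def df_fully_invariant(p, k, q=None):
    """Common Lambda, Phi and Psi, tested on the pooled covariance matrix."""
    q = k * k if q is None else q
    return p * (p + 1) // 2 - p * k + q - k * (k + 1) // 2 - p


m, p, k = 4, 9, 3
d_k = df_common_k(m, p, k)
d_lam = df_invariant_pattern(m, p, k)
d_lam_psi = df_invariant_pattern_and_unique(m, p, k)
d_full = df_fully_invariant(p, k)
d_lam_given_k = d_lam - d_k            # test of invariant pattern given common k
d_psi_given_lam = d_lam_psi - d_lam    # test of invariant Psi given invariant pattern
```

In the unspecified case the differences have a direct reading: `d_lam_given_k` equals $(m-1)(pk - k^2)$, the number of loadings that are no longer free to differ between groups, and `d_psi_given_lam` equals $(m-1)p$.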

Various other types of hypotheses may also be tested. For example,
one may assume that some factors are orthogonal and some are oblique (see
Jöreskog, 1969).
It should be emphasized that even if a $\chi^2$ is significant, there may
still be reasons to consider the model. After all, the basic model with its
assumptions of linearity and normality is only regarded as an approximation
to reality. The true population covariance matrix will not in general be
exactly of the form specified by the hypothesis, but there will be
discrepancies between the true population covariance matrix and the formal
model postulated. These discrepancies will not get smaller when the sample
size increases but will tend to give large $\chi^2$ values. Therefore, a model
may well be accepted even though $\chi^2$ is large. Whether to accept or
reject a model cannot be decided on a purely statistical basis. This is
largely a matter of the experimenter's interpretations of the data, based on
substantive theoretical and conceptual considerations. Ultimately the
criterion for goodness of the model depends on its usefulness and on the
results it produces.

3.7 A Numerical Illustration 

To illustrate the methods previously discussed we use the same data as
Meredith [1964b] used to illustrate his rotational procedure. The data
consist of nine tests selected from a battery of 26 psychological tests
described by Holzinger and Swineford [1939]. The tests were administered to
7th and 8th grade children in two schools, the Pasteur and the Grant-White
Schools in the Chicago area. The nine tests were selected so that each of
the three factors (space, verbal and memory) would be represented by three
tests. The nine tests used, with their original code numbers in parentheses,
were: Visual Perception (1), Cubes (2), Paper Form Board (3), General
Information (5), Sentence Completion (7), Word Classification (8), Figure
Recognition (16), Object Number (17) and Number-Figure (18). On the basis
of a speeded addition test, Meredith divided each of the samples from the
two schools into two approximately equal groups by splitting at the median
score within each school. This yielded four groups that will be used for
this illustration. The correlation matrices taken from Meredith's table 2
are shown in Table 1a with unscaled and scaled standard deviations in
Table 1b. The sample sizes are: Group 1: Pasteur Low = 77, Group 2:
Pasteur High = 79, Group 3: Grant-White Low = 74, Group 4:
Grant-White High = 71. Because of the way the two groups within schools
were selected, it is doubtful that the assumption of multinormality is valid.
This departure from multinormality will have no great effect on the estimates
but may be more serious for the $\chi^2$ values. In particular, the $\chi^2$
test of $H_\Sigma$ is known to be sensitive to departures from multinormality.
For this reason and also because the sample sizes are relatively small, the
$\chi^2$ values that will be reported should be interpreted very cautiously.
It should be emphasized that these data have been chosen merely to illustrate
the procedures of this paper. Another application, with more substantive
interest and with larger and widely varying sample sizes, is given by McGaw
and Jöreskog [1970].

We begin by testing the hypothesis $H_\Sigma$ that
$\Sigma_1 = \Sigma_2 = \Sigma_3 = \Sigma_4$. This gives the test statistic
$M = 146.95$ with 135 degrees of freedom. Transformation to Box's
$F$-statistic gives $F_{135,\infty} = 1.03$. In view of the
remark just made, this value is inconclusive. However, for the purpose of
illustrating a simultaneous analysis of all four scaled dispersion matrices,
we shall follow the procedure of section 3.6 and test various hypotheses
of interest. The results are summarized in Table 2.

The first hypothesis, $H_{k=3}$, is that three factors adequately reproduce
the correlations in each population. This gives $\chi^2 = 47.73$ with 48 degrees
of freedom. This $\chi^2$ is the sum of four $\chi^2$'s, one from each population and
each with 12 degrees of freedom. These are $\chi_1^2 = 15.33$, $\chi_2^2 = \ldots$,
$\chi_3^2 = 14.40$ and $\chi_4^2 = 7.56$. Thus we cannot reject the hypothesis that the
number of factors is three for each population. We therefore proceed by
investigating whether there is an invariant factor pattern or not.

The next hypothesis, $H_{\Lambda_u}$, is that there is an invariant unspecified
(unrestricted) factor pattern $\Lambda_u$. To test this hypothesis we fix one
nonzero element and two zero elements in each column of $\Lambda_u$ and leave
the $\Phi_g$ and the $\Psi_g$, $g = 1, 2, 3, 4$, completely unconstrained.
A convenient way to choose the fixed elements in $\Lambda_u$ is to use a reference
variables solution as, for example,

$$\Lambda_u = \begin{pmatrix}
1 & 0 & 0 \\
x & x & x \\
x & x & x \\
0 & 1 & 0 \\
x & x & x \\
x & x & x \\
0 & 0 & 1 \\
x & x & x \\
x & x & x
\end{pmatrix} \tag{31}$$

Here the zeros and ones stand for fixed values and x's for parameters to be
estimated. Tests 1, 4 and 7 have been chosen to be pure in their respective
factors. The test of $H_{\Lambda_u}$ gives $\chi^2 = 90.57$ with 102 degrees of freedom,
which is not significant. Thus, we cannot reject the hypothesis that there
is an invariant factor pattern with three factors. To strengthen the model
we now make use of our knowledge about the tests and hypothesize that the
invariant factor pattern has a specific form, namely



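$$\Lambda = \begin{pmatrix}
1 & 0 & 0 \\
x & 0 & 0 \\
x & 0 & 0 \\
0 & 1 & 0 \\
0 & x & 0 \\
0 & x & 0 \\
0 & 0 & 1 \\
0 & 0 & x \\
0 & 0 & x
\end{pmatrix} \tag{32}$$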






i.e., we assume that $\Lambda$ has a nonoverlapping group structure, where the first
three tests are loaded on the first factor only, the next three tests on the
second factor only and the last three tests on the third factor only. As
before, we put no constraints on the $\Phi$'s and the $\Psi$'s. A test of this
hypothesis gives $\chi^2 = 131.24$ with 114 degrees of freedom. This has a
probability level of about 0.13. Thus we cannot reject the hypothesis that the
invariant factor pattern is of the specified form. An examination of the
$\Psi$'s, in relation to their standard errors in the solution under $H_\Lambda$,
revealed that many of these were not sufficiently different to be considered
different. This suggests that one should also examine the hypothesis
$H_{\Lambda\Psi}$ that a stricter form of invariance holds, namely where also the
$\Psi$'s are the same for all populations. A test of $H_{\Lambda\Psi}$ gives
$\chi^2 = 172.14$ with 141 degrees of freedom. This is just significant at the
5% level. The maximum likelihood solution under $H_{\Lambda\Psi}$ is shown in
Table 3. Finally, to complete the sequence of hypotheses we consider
the hypothesis $H_{\Lambda\Phi\Psi}$ that the whole factor structure is invariant,
with the same factor pattern $\Lambda$ as before. This gives $\chi^2 = 212.80$ with
159 degrees of freedom, which is highly significant. This would seem to
contradict the test of $H_\Sigma$. This is not so, however, since $H_{\Lambda\Phi\Psi}$
is a much more restrictive hypothesis than $H_\Sigma$. The hypothesis
$H_{\Lambda\Phi\Psi}$ requires that the common $\Sigma$ has a factor structure with
three common factors with a factor pattern of the restricted type (32), but
there may be many other possible representations of the common $\Sigma$. In
fact, if $\Sigma$ is represented by the unrestricted factor pattern in (31)
instead, one obtains the $\chi^2$ listed for $H_{\Lambda_u\Phi\Psi}$ in Table 2, with
147 degrees of freedom, so that $H_{\Lambda_u\Phi\Psi}$ cannot be rejected although
$H_{\Lambda\Phi\Psi}$ was rejected.
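
The probability levels quoted for this sequence of tests can be checked approximately without tables. The sketch below is an illustration, not the computation used in the paper: it uses the Wilson-Hilferty cube-root normal approximation to the $\chi^2$ distribution, and the function name and dictionary keys are hypothetical. The $\chi^2$ values and degrees of freedom are those reported in this section.

```python
import math


def chi2_upper_tail(x, df):
    """Approximate P(chi-square with df degrees of freedom > x) (Wilson-Hilferty)."""
    h = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (chi-square, degrees of freedom) as reported for each hypothesis
reported = {
    "invariant unrestricted pattern": (90.57, 102),
    "invariant restricted pattern": (131.24, 114),
    "pattern and unique variances invariant": (172.14, 141),
    "whole structure invariant": (212.80, 159),
}

if __name__ == "__main__":
    for name, (chi2, df) in reported.items():
        print(f"{name}: chi2 = {chi2:.2f}, df = {df}, p = {chi2_upper_tail(chi2, df):.3f}")
```

For the restricted invariant pattern the approximation gives a level close to the 0.13 quoted above, and for the hypothesis with invariant unique variances a level just under 0.05.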

Altogether these results suggest two alternative descriptions of the
data. One is that the whole factor structure is invariant over populations
with a three-factor solution of a fairly complex form. The other is to
represent the tests in each population by three factors of a particularly
simple form, but these factors have different variance-covariance matrices
in the different populations. Additional studies with larger sample sizes
are needed to discriminate statistically between the two models. Perhaps
the second alternative has the most intuitive appeal. Inspecting the factor
variances in Table 3, it is seen that for the Pasteur school they tend to be
higher for the Low group than for the High group, whereas for the
Grant-White school generally the opposite holds. Also for the two High groups
the variances are generally lower for the Pasteur school than for the
Grant-White school. Note also the low covariance of 0.08 between S and M for
the Low Grant-White group and the corresponding high covariance 1.03 for the
Low Pasteur group. This seems to indicate that the Low Pasteur group cannot
[...] discriminate between the spatial and the memory tasks whereas the Low
Grant-White group can do so clearly.

4. Estimation of Factor Means



A stricter form of invariance than (15) is obtained if in (1) we require
that not only the factor matrix $\Lambda_g$ but also the vector $\mu_g$ be invariant
over populations. Then

$$x_g = \mu + \Lambda f_g + z_g \tag{33}$$

In this case it is not reasonable to assume, as before, that
$\mathcal{E}(f_g) = 0$ for all $g$, since this implies that $\mathcal{E}(x_g) = \mu$
for all $g$. Instead we assume that each $f_g$ has its own mean
$\mathcal{E}(f_g) = \nu_g$ and propose to estimate these $\nu_g$,
$g = 1, 2, \ldots, m$. However, since each $f_g$ may be replaced by $f_g + b$
if $\mu$ is replaced by $\mu - \Lambda b$, without changing $x_g$ in (33), some rule
is necessary for fixing the origin of the $f_g$. It seems most convenient to
fix the origin such that

$$\sum_{g=1}^{m} N_g \nu_g = 0 \tag{34}$$

To estimate $\mu$ and $\nu_g$ we assume that $\Lambda$, $\Phi_g$, $\Psi_g$, and hence
also $\Sigma_g$, are known and equal to their estimates $\hat\Lambda$, $\hat\Phi_g$,
$\hat\Psi_g$ and $\hat\Sigma_g$. Let $\bar x_g$, as before, be the sample mean vector
in group $g$ and let $\bar x$ be the overall mean, i.e.,

$$\bar x = (1/N) \sum_{g=1}^{m} N_g \bar x_g \tag{35}$$

with $N = \sum_{g=1}^{m} N_g$. We take $\bar x$ to be an estimator of $\mu$. Then each
of the following three estimators of $\nu_g$ seems reasonable:



$$\hat\nu_g = (\Lambda'\Lambda)^{-1}\Lambda'(\bar x_g - \bar x) \tag{36}$$

$$\hat\nu_g = \Phi_g\Lambda'\Sigma_g^{-1}(\bar x_g - \bar x) \tag{37}$$

$$\hat\nu_g = (\Lambda'\Sigma_g^{-1}\Lambda)^{-1}\Lambda'\Sigma_g^{-1}(\bar x_g - \bar x) \tag{38}$$

Formula (36) is obtained by fitting the theoretical means $\mu + \Lambda\nu_g$ to
the observed means $\bar x_g$, $g = 1, 2, \ldots, m$, by least squares. The
advantage of this formula is that the weighting matrix in front of
$\bar x_g - \bar x$ is independent of $g$.

Formula (37) is obtained if one applies the mean vectors to the regression
formula for correlated factor scores.

Formula (38) is the maximum likelihood estimator of $\nu_g$ for given
$\Lambda$, $\Phi_g$, $\Psi_g$, $g = 1, 2, \ldots, m$, and $\mu = \bar x$. The latter is
obtained from the minimization of

$$\sum_{g=1}^{m}\sum_{\alpha=1}^{N_g} (x_{g\alpha} - \bar x - \Lambda\nu_g)'\,\Sigma_g^{-1}\,(x_{g\alpha} - \bar x - \Lambda\nu_g)$$

where $x_{g\alpha}$ is the vector of observed test scores for person $\alpha$ in
group $g$. Formula (36) satisfies (34) for the estimates, but (37) and (38)
have to be scaled afterwards so that (34) holds.
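
As an illustration of the least squares estimator (36) and of the origin rule (34), the following Python sketch may help. It is a minimal example under stated assumptions, not the author's program: the loadings, group sizes and mean vectors in the usage lines are made-up toy values, all function names are hypothetical, and the `recenter` step (subtracting the weighted average of the estimates) is one assumed way of adjusting estimates such as (37) and (38) so that (34) holds.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def overall_mean(group_sizes, group_means):
    """Weighted mean vector over groups, as in (35)."""
    total = sum(group_sizes)
    p = len(group_means[0])
    return [sum(n * xg[i] for n, xg in zip(group_sizes, group_means)) / total
            for i in range(p)]


def least_squares_factor_means(lam, group_sizes, group_means):
    """Estimator (36): solve (L'L) nu_g = L'(xbar_g - xbar) for every group."""
    p, k = len(lam), len(lam[0])
    xbar = overall_mean(group_sizes, group_means)
    ltl = [[sum(lam[t][i] * lam[t][j] for t in range(p)) for j in range(k)]
           for i in range(k)]
    estimates = []
    for xg in group_means:
        dev = [a - b for a, b in zip(xg, xbar)]
        rhs = [sum(lam[t][i] * dev[t] for t in range(p)) for i in range(k)]
        estimates.append(solve(ltl, rhs))
    return estimates


def recenter(group_sizes, nus):
    """Shift the estimates so that the weighted sum over groups is zero, as in (34)."""
    total = sum(group_sizes)
    k = len(nus[0])
    shift = [sum(n * nu[i] for n, nu in zip(group_sizes, nus)) / total for i in range(k)]
    return [[nu[i] - shift[i] for i in range(k)] for nu in nus]


if __name__ == "__main__":
    lam = [[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.5]]   # toy 4 x 2 pattern
    sizes = [10, 30]                                         # toy group sizes
    means = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]     # toy mean vectors
    for nu in least_squares_factor_means(lam, sizes, means):
        print(nu)
```

Because the deviations $\bar x_g - \bar x$ weighted by $N_g$ sum to zero and the weighting matrix in (36) does not depend on $g$, the least squares estimates satisfy (34) without any adjustment, which the toy example confirms numerically.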
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TABI^ la-b 

Intercorrelation Matrices (a)

Group 1 Above Main Diagonal, Group 2 Below Main Diagonal

                          1     2     3     4     5     6     7     8     9
Visual Perception        --   .52   .48   .28   .26   .40   .42   .12   .25
Cubes                   .24    --   .55   .01   .01   .26   .52   .05  -.04
Paper Form Board        .25   .22    --   .06   .01   .10   .22   .05   .01
General Information     .52   .05   .23    --   .75   .60   .15  -.08  -.05
Sentence Completion     .55   .25   .18   .68    --   .65   .07   .06   .10
Word Classification     .56   .10   .11   .59   .66    --   .56   .19   .24
Figure Recognition      .22   .01  -.07   .09   .11   .12    --   .29   .19
Object-Number          -.02  -.01  -.15   .05   .08   .05   .19    --   .58
Number-Figure           .09  -.14  -.06   .16   .02   .12   .15   .29    --

Group 3 Above Main Diagonal, Group 4 Below Main Diagonal

                          1     2     3     4     5     6     7     8     9
Visual Perception        --   .54   .41   .58   .40   .42   .55   .16   .55
Cubes                   .52    --   .21   .52   .16   .15   .27   .01   .27
Paper Form Board        .54   .18    --   .51   .24   .55   .50   .09   .09
General Information     .51   .24   .51    --   .69   .55   .17   .51   .54
Sentence Completion     .22   .16   .29   .62    --   .65   .20   .50   .27
Word Classification     .27   .20   .52   .57   .61    --   .51   .54   .27
Figure Recognition      .48   .51   .52   .18   .20   .29    --   .51   .58
Object-Number           .20   .01   .15   .06   .19   .15   .56    --   .58
Number-Figure           .42   .28   .40   .11   .07   .18   .55   .44    --

Standard Deviations (b)

                               Unscaled                      Scaled
                          1     2     3     4         1     2     3     4
Visual Perception        7.4   6.7   6.6   7.2      1.06  0.96  0.95  1.05
Cubes                    5.6   4.0   4.8   4.0      1.20  0.86  1.05  0.86
Paper Form Board         2.9   2.8   2.6   5.0      1.02  0.99  0.92  1.06
General Information     11.8  11.0  11.5  11.5      1.05  0.96  0.99  1.01
Sentence Completion      5.2   5.2   4.7   4.5      1.08  1.06  0.96  0.91
Word Classification      5.2   5.5   5.0   5.5      0.99  1.01  0.95  1.05
Figure Recognition       8.8   7.6   6.1   7.4      1.17  1.01  0.81  0.98
Object-Number            4.7   5.2   5.9   4.9      1.00  1.10  0.85  1.04
Number-Figure            4.6   4.4   5.9   4.7      1.0?  1.00  0.88  1.07

TABLE 2

Summary of Analyses

Hypothesis                        $\chi^2$    No. par.    d.f.      P
$H_\Sigma$                         146.95        45        135     0.23
$H_{k=3}$                           47.73       132         48     0.47
$H_{\Lambda_u}$                     90.57        78        102     0.78
$H_\Lambda$                        131.24        66        114     0.13
$H_{\Lambda\Psi}$                  172.14        39        141     0.04
$H_{\Lambda\Phi\Psi}$              212.80        21        159     0.00
$H_{\Lambda_u\Phi\Psi}$             56.20        33        147     1.00

TABLE 3

Maximum Likelihood Solution under $H_{\Lambda\Psi}$

(Asterisks Denote Parameter Values Specified by Hypothesis)

                                 $\Lambda$
                           S      V      M        $\Psi$
Visual Perception         .72     0*     0*       .69
Cubes                     .43     0*     0*       .90
Paper Form Board          .51     0*     0*       .86
General Information        0*    .80     0*       .60
Sentence Completion        0*    .85     0*       .33
Word Classification        0*    .75     0*       .67
Figure Recognition         0*     0*    .58       .81
Object-Number              0*     0*    .48       .88
Number-Figure              0*     0*    .57       .83

                 S      V      M                        S      V      M
$\Phi_1$ =  S   1.02                    $\Phi_2$ =  S   0.89
            V   0.53   0.91                         V   0.62   0.93
            M   1.03   0.36   1.30                  M   0.59   0.50   0.58

                 S      V      M                        S      V      M
$\Phi_3$ =  S   0.72                    $\Phi_4$ =  S   1.58
            V   0.52   1.06                         V   0.42   1.12
            M   0.08   0.20   0.90                  M   0.71   0.27   1.25